Abstract. Preconditioned eigenvalue solvers (eigensolvers) are gaining popularity, but their
convergence theory remains sparse and complex. We consider the simplest preconditioned eigensolver,
the gradient iterative method with a fixed step size, for symmetric generalized eigenvalue problems,
where we use the gradient of the Rayleigh quotient as an optimization direction. A sharp convergence
rate bound for this method was obtained in 2001-2003. It remains the only known such
bound for any of the methods in this class. While the bound is short and simple, its proof is not.
We extend the bound to Hermitian matrices in the complex space and present a new self-contained
and significantly shorter proof using novel geometric ideas.
1. Introduction. We consider a generalized eigenvalue problem (eigenproblem)
for a linear pencil $B - \mu A$ with symmetric (Hermitian in the complex case) matrices $A$
and $B$ with positive definite $A$. The eigenvalues $\mu_i$ are enumerated in decreasing order
$\mu_1 \ge \ldots \ge \mu_{\min}$, and the $x_i$ denote the corresponding eigenvectors. The largest value
of the Rayleigh quotient $\mu(x) = (x, Bx)/(x, Ax)$, where $(\cdot, \cdot)$ denotes the standard
scalar product, is the largest eigenvalue $\mu_1$. It can be approximated iteratively by
maximizing the Rayleigh quotient in the direction of its gradient, which is proportional
to $(B - \mu(x)A)x$. Preconditioning is used to accelerate the convergence; see, e.g.,
[2, 4, 5, 6, 8] and the references therein. Here we consider the simplest preconditioned
eigenvalue solver (eigensolver), the gradient iterative method with an explicit formula
for the step size, cf. [2], one step of which is described by

(1.1) $x' = x + \frac{1}{\mu(x) - \mu_{\min}}\, T\,(Bx - \mu(x)Ax), \qquad \mu(x) = \frac{(x, Bx)}{(x, Ax)}.$

The symmetric (Hermitian in the complex case) positive definite matrix T in (1.1) is 
called the preconditioner. Since A and T are both positive definite, we assume that 

(1.2) $(1 - \gamma)(z, T^{-1}z) \le (z, Az) \le (1 + \gamma)(z, T^{-1}z), \quad \forall z,$ for a given $\gamma \in [0, 1)$.

The following result is proved in [8, 9, 10] for symmetric matrices in the real space. 
Theorem 1.1. If $\mu_{i+1} < \mu(x) \le \mu_i$ then $\mu(x') \ge \mu(x)$ and

(1.3) $\frac{\mu_i - \mu(x')}{\mu(x') - \mu_{i+1}} \le \sigma^2\, \frac{\mu_i - \mu(x)}{\mu(x) - \mu_{i+1}}, \qquad \sigma = 1 - (1 - \gamma)\,\frac{\mu_i - \mu_{i+1}}{\mu_i - \mu_{\min}}.$

The convergence factor $\sigma$ cannot be improved with the chosen terms and assumptions.
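As a numerical illustration of Theorem 1.1, the following minimal sketch performs one step of the method for a diagonal test problem and compares both sides of the bound. It assumes $A = I$, $B = \mathrm{diag}(\mu_1, \ldots, \mu_{\min})$, a diagonal preconditioner $T$ with entries in $[1 - \gamma, 1 + \gamma]$ (which satisfies the preconditioner quality assumption when $A = I$), and the step size $1/(\mu(x) - \mu_{\min})$. The function names and the test data are hypothetical and not taken from the paper.

```python
import random

def rayleigh(mu, x):
    """Rayleigh quotient (x, Bx)/(x, x) for B = diag(mu) and A = I."""
    return sum(m * v * v for m, v in zip(mu, x)) / sum(v * v for v in x)

def gradient_step(mu, t, x):
    """One preconditioned gradient step with A = I, B = diag(mu), T = diag(t).

    Assumption: the step size is 1/(mu(x) - mu_min).
    """
    rho = rayleigh(mu, x)
    step = 1.0 / (rho - min(mu))
    return [v + step * tk * (m - rho) * v for m, tk, v in zip(mu, t, x)]

def check_bound(mu, t, gamma, x):
    """Return the left-hand side and the right-hand side of the rate bound."""
    rho = rayleigh(mu, x)
    rho_new = rayleigh(mu, gradient_step(mu, t, x))
    # 0-based index i with mu[i+1] < mu(x) <= mu[i]; mu is sorted decreasingly.
    i = max(k for k in range(len(mu) - 1) if rho <= mu[k])
    sigma = 1.0 - (1.0 - gamma) * (mu[i] - mu[i + 1]) / (mu[i] - mu[-1])
    lhs = (mu[i] - rho_new) / (rho_new - mu[i + 1])
    rhs = sigma ** 2 * (mu[i] - rho) / (rho - mu[i + 1])
    return lhs, rhs

random.seed(0)
mu, gamma = [5.0, 3.0, 2.0, 0.5], 0.5
results = []
for _ in range(200):
    x = [random.uniform(-1.0, 1.0) for _ in mu]
    t = [random.uniform(1.0 - gamma, 1.0 + gamma) for _ in mu]
    results.append(check_bound(mu, t, gamma, x))
print(all(lhs <= rhs + 1e-12 for lhs, rhs in results))
```

A negative left-hand side means that the new Rayleigh quotient has already passed $\mu_i$, in which case the bound holds trivially.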
Compared to other known nonasymptotic convergence rate bounds for similar
preconditioned eigensolvers, e.g., [1, 2, 4, 5], the advantages of (1.3) are in its sharpness
and elegance. Method (1.1) is the simplest preconditioned eigensolver, but (1.3)
still remains the only known sharp bound in these terms for any preconditioned
eigensolver. While bound (1.3) is short and simple, its proof in [8] is quite the opposite.
It covers only the real case and is not self-contained; in addition, it requires
most of the material from [9, 10]. Here we extend the bound to Hermitian matrices
and give a new, much shorter, and self-contained proof of Theorem 1.1, which is a great
qualitative improvement compared to that of [8, 9, 10]. The new proof is not yet as
elementary as we would like it to be; however, it is easy enough to hope that a similar
approach might be applicable in future work on preconditioned eigensolvers.

Our new proof is based on novel techniques combined with some old ideas of
[3, 9, 10]. We demonstrate that, for a given initial eigenvector approximation $x$,
the next iterative approximation $x'$ described by (1.1) belongs to a cone if we apply
any preconditioner satisfying (1.2). We analyze a corresponding continuation gradient
method involving the gradient flow of the Rayleigh quotient and show that the
smallest gradient norm (evidently leading to the slowest convergence) of the continuation
method is reached when the initial vector belongs to a subspace spanned by
two specific eigenvectors, namely $x_i$ and $x_{i+1}$. This is done by showing that Temple's
inequality, which provides a lower bound for the norm of the gradient $\nabla\mu(x)$, is sharp
only in $\mathrm{span}\{x_i, x_{i+1}\}$. Next, we extend by integration the result for the continuation
gradient method to our actual fixed step gradient method to conclude that the point
on the cone, which corresponds to the poorest convergence and thus gives the guaranteed
convergence rate bound, belongs to the same two-dimensional invariant subspace
$\mathrm{span}\{x_i, x_{i+1}\}$. This reduces the convergence analysis to a two-dimensional case for
shifted inverse iterations, where the sharp convergence rate bound is established.

2. The proof of Theorem 1.1. We start with several simplifications: 

Theorem 2.1. We can assume that $\gamma > 0$, $A = I$, $B > 0$ is diagonal, eigenvalues
are simple, $\mu(x) < \mu_i$, and $\mu(x') < \mu_i$ in Theorem 1.1 without loss of generality.

Proof. First, we observe that method (1.1) and bound (1.3) are evidently both
invariant with respect to a real shift $s$ if we replace the matrix $B$ with $B + sA$, so
without loss of generality we need only consider the case $\mu_{\min} = 0$, which makes $B \ge 0$.
Second, by changing the basis from coordinate vectors to the eigenvectors of $A^{-1}B$
we can make $B$ diagonal and $A = I$. Third, having $\mu(x') \ge \mu(x)$, if $\mu(x) = \mu_i$ or
$\mu(x') \ge \mu_i$, or both, bound (1.3) becomes trivial. The assumption $\gamma > 0$ is a bit
more delicate. The vector $x'$ depends continuously on the preconditioner $T$, so we
can assume that $\gamma > 0$ and extend the final bound to the case $\gamma = 0$ by continuity.

Finally, we again use continuity to explain why we can assume that all eigenvalues
(in fact, we only need $\mu_i$ and $\mu_{i+1}$) are simple and make $\mu_{\min} > 0$ and thus $B > 0$
without changing anything. Let us list all $B$-dependent terms, in addition to all
participating eigenvalues, in method (2.1): $\mu(x)$ and $x'$; and in bound (1.3): $\mu(x)$ and
$\mu(x')$. All these terms depend on $B$ continuously if $B$ is slightly perturbed into $B_\epsilon$
with some $\epsilon \to 0$, so we increase the diagonal entries of the matrix $B$ by arbitrarily
small amounts to make all eigenvalues of $B_\epsilon$ simple and $\mu_{\min} > 0$. If we prove bound (1.3) for
the matrix $B_\epsilon$ with simple positive eigenvalues, and show that the bound is sharp as
$0 < \mu_{\min} \to 0$ with $\epsilon \to 0$, we take the limit $\epsilon \to 0$ and by continuity extend the result
to the limit matrix $B \ge 0$ with $\mu_{\min} = 0$ and possibly multiple eigenvalues. □
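The shift invariance used in the first step of the proof can be checked numerically. The sketch below assumes diagonal matrices $A$, $B$, and $T$, and the step size $1/(\mu(x) - \mu_{\min})$ in (1.1); under these assumptions, replacing $B$ with $B + sA$ shifts $\mu(x)$ and $\mu_{\min}$ by $s$ and leaves the residual $Bx - \mu(x)Ax$ unchanged, so the new iterate is the same. All names and numbers are hypothetical test data.

```python
def rayleigh(b, a, x):
    """Rayleigh quotient (x, Bx)/(x, Ax) for B = diag(b) and A = diag(a)."""
    num = sum(bk * v * v for bk, v in zip(b, x))
    den = sum(ak * v * v for ak, v in zip(a, x))
    return num / den

def gradient_step(b, a, t, x):
    """One step of (1.1) for diagonal A, B, T, assuming step size 1/(mu(x) - mu_min)."""
    rho = rayleigh(b, a, x)
    mu_min = min(bk / ak for bk, ak in zip(b, a))  # eigenvalues of the pencil are b_k/a_k
    return [v + tk * (bk - rho * ak) * v / (rho - mu_min)
            for bk, ak, tk, v in zip(b, a, t, x)]

a = [1.0, 2.0, 0.5, 1.5]        # positive definite A
b = [4.0, 3.0, 0.4, -1.5]       # B may be indefinite before the shift
t = [0.9, 0.6, 1.8, 0.5]        # a_k * t_k lies in [0.75, 1.25], i.e. gamma = 0.25
x = [1.0, -0.5, 0.3, 0.8]
s = 2.5                         # real shift

b_shifted = [bk + s * ak for bk, ak in zip(b, a)]
x_new = gradient_step(b, a, t, x)
x_new_shifted = gradient_step(b_shifted, a, t, x)
max_diff = max(abs(u - v) for u, v in zip(x_new, x_new_shifted))
print(max_diff < 1e-12)
```

Since all eigenvalues and both Rayleigh quotients move by the same amount $s$, the differences that enter bound (1.3) do not change either.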
It is convenient to rewrite (1.1)-(1.3) equivalently by Theorem 2.1 as follows:^1

(2.1) $\mu(x)\,x' = Bx - (I - T)(Bx - \mu(x)x), \qquad \mu(x) = \frac{(x, Bx)}{(x, x)},$

(2.2) $\|I - T\| \le \gamma, \qquad 0 < \gamma < 1;$

and if $\mu_{i+1} < \mu(x) < \mu_i$ and $\mu(x') < \mu_i$ then $\mu(x') > \mu(x)$ and

(2.3) $\frac{\mu_i - \mu(x')}{\mu(x') - \mu_{i+1}} \le \sigma^2\, \frac{\mu_i - \mu(x)}{\mu(x) - \mu_{i+1}}, \qquad \sigma = 1 - (1 - \gamma)\,\frac{\mu_i - \mu_{i+1}}{\mu_i} = \gamma + (1 - \gamma)\,\frac{\mu_{i+1}}{\mu_i}.$

Now we establish the validity and sharpness of bound (2.3) assuming (2.1) and (2.2). 

Theorem 2.2. Let us define^2 $\phi_\gamma(x) = \arcsin\left(\gamma\|Bx - \mu(x)x\| / \|Bx\|\right)$; then
$\phi_\gamma(x) < \pi/2$ and $\angle\{x', Bx\} \le \phi_\gamma(x)$. Let $w \ne 0$ be defined as the vector constrained
by $\angle\{w, Bx\} \le \phi_\gamma(x)$ and with the smallest value $\mu(w)$. Then $\mu(x') \ge \mu(w) > \mu(x)$.

Proof. Orthogonality $(x, Bx - \mu(x)x) = 0$ by the Pythagorean theorem implies
$\|Bx\|^2 = \|\mu(x)x\|^2 + \|Bx - \mu(x)x\|^2$, so $\|Bx - \mu(x)x\| < \|Bx\|$, since $\mu(x) > 0$ as
$B > 0$, and $\sin\angle\{x, Bx\} = \sin\phi_1(x) = \|Bx - \mu(x)x\| / \|Bx\| < 1$, where $Bx \ne 0$ as
$B > 0$. A ball with the radius $\gamma\|Bx - \mu(x)x\| \ge \|I - T\|\,\|Bx - \mu(x)x\|$ by (2.2) centered
at $Bx$ contains $\mu(x)x'$ by (2.1), so $\sin\angle\{x', Bx\} \le \gamma\|Bx - \mu(x)x\| / \|Bx\| < \gamma < 1$.

The statement $\mu(x') \ge \mu(w)$ follows directly from the definition of $w$. Now,

$\frac{(x, Bx)}{\|x\|\,\|Bx\|} = \cos\phi_1(x) < \cos\angle\{w, Bx\} = \frac{|(w, Bx)|}{\|w\|\,\|Bx\|} \le \frac{(w, Bw)^{1/2}(x, Bx)^{1/2}}{\|w\|\,\|Bx\|}$

as $B > 0$, so $\sqrt{\mu(x)} < \sqrt{\mu(w)}$ and $\mu(x) < \mu(w)$. □
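The two statements of Theorem 2.2 that concern the actual iterate $x'$ can be checked numerically. The sketch below assumes a positive diagonal $B$ and a diagonal $T$ with entries in $[1 - \gamma, 1 + \gamma]$, so that $\|I - T\| \le \gamma$, and it compares sines rather than angles, which is equivalent for angles in $[0, \pi/2]$. The function names and the data are hypothetical.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sin_angle(u, v):
    """Sine of the angle in [0, pi/2] between the lines spanned by u and v."""
    c = dot(u, v) / dot(v, v)
    perp = [a - c * b for a, b in zip(u, v)]   # part of u orthogonal to v
    return math.sqrt(dot(perp, perp) / dot(u, u))

def cone_check(mu, t, gamma, x):
    """Return sin(angle{x', Bx}), sin(phi_gamma(x)), mu(x), and mu(x')."""
    bx = [m * v for m, v in zip(mu, x)]
    rho = dot(x, bx) / dot(x, x)
    r = [b - rho * v for b, v in zip(bx, x)]
    sin_phi = gamma * math.sqrt(dot(r, r) / dot(bx, bx))
    # (2.1): mu(x) x' = Bx - (I - T)(Bx - mu(x) x) with T = diag(t)
    x_new = [(b - (1.0 - tk) * rk) / rho for b, tk, rk in zip(bx, t, r)]
    bx_new = [m * v for m, v in zip(mu, x_new)]
    rho_new = dot(x_new, bx_new) / dot(x_new, x_new)
    return sin_angle(bx, x_new), sin_phi, rho, rho_new

random.seed(1)
mu, gamma = [5.0, 3.0, 2.0, 0.5], 0.5
checks = []
for _ in range(200):
    x = [random.uniform(-1.0, 1.0) for _ in mu]
    t = [random.uniform(1.0 - gamma, 1.0 + gamma) for _ in mu]
    checks.append(cone_check(mu, t, gamma, x))
print(all(s <= s_phi + 1e-12 and rho_new >= rho for s, s_phi, rho, rho_new in checks))
```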

We denote by $C_{\phi_\gamma(x)}(Bx) := \{y : \angle\{y, Bx\} \le \phi_\gamma(x)\}$ the circular cone around
$Bx$ with the opening angle $\phi_\gamma(x)$. Theorem 2.2 replaces $x'$ with the minimizer $w$ of
the Rayleigh quotient on the cone $C_{\phi_\gamma(x)}(Bx)$ in the rest of the paper, except at the
end of the proof of Theorem 2.7, where we show that bounding below the value $\mu(w)$
instead of $\mu(x')$ still gives the sharp estimate.

Later on, in the proof of Theorem 2.4, we use an argument that holds easily only 
in the real space, so we need the following last simplification. 

Theorem 2.3. Without loss of generality we can consider only the real case. 

Proof. The key observation is that for our positive diagonal matrix $B$ the Rayleigh
quotient depends evidently only on the absolute values of the vector components,
i.e., $\mu(x) = \mu(|x|)$, where the absolute value operation is applied component-wise.
Moreover, $\|Bx - \mu(x)x\| = \|B|x| - \mu(|x|)|x|\|$ and $\|Bx\| = \|B|x|\|$, so $\phi_\gamma(x) = \phi_\gamma(|x|)$.
The cone $C_{\phi_\gamma(x)}(Bx)$ lives in the complex space, but we also need its substitute in
the real space. Let us introduce the notation $C^{\mathbb{R}}_{\phi_\gamma(|x|)}(B|x|)$ for the real circular cone
with the opening angle $\phi_\gamma(|x|)$ centered at the real vector $B|x|$. Next we show that in
the real space we have the inclusion $|C_{\phi_\gamma(x)}(Bx)| \subseteq C^{\mathbb{R}}_{\phi_\gamma(|x|)}(B|x|)$.

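For any complex nonzero vectors $x$ and $y$, we have $|(y, Bx)| \le (|y|, B|x|)$ by
the triangle inequality, thus $\angle\{|y|, B|x|\} \le \angle\{y, Bx\}$. If $y \in C_{\phi_\gamma(x)}(Bx)$ then
$\angle\{|y|, B|x|\} \le \angle\{y, Bx\} \le \phi_\gamma(x) = \phi_\gamma(|x|)$, i.e., indeed, $|y| \in C^{\mathbb{R}}_{\phi_\gamma(|x|)}(B|x|)$, which
means that $|C_{\phi_\gamma(x)}(Bx)| \subseteq C^{\mathbb{R}}_{\phi_\gamma(|x|)}(B|x|)$ as required.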
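The two facts used so far in this proof, $\mu(x) = \mu(|x|)$ and $|(y, Bx)| \le (|y|, B|x|)$, are easy to check on a small complex example. The sketch below assumes a positive diagonal $B$; the vectors and the helper names are hypothetical.

```python
def inner(u, v):
    """Standard scalar product (u, v) = u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def rayleigh(mu, x):
    """Rayleigh quotient (x, Bx)/(x, x) for B = diag(mu); real for Hermitian B."""
    bx = [m * v for m, v in zip(mu, x)]
    return (inner(x, bx) / inner(x, x)).real

mu = [3.0, 2.0, 1.0]                    # positive diagonal B
x = [1 + 1j, 2j, -1 + 0j]
y = [1 + 0j, 1j, 1 - 1j]
abs_x = [abs(v) for v in x]             # component-wise absolute values
abs_y = [abs(v) for v in y]

lhs = abs(inner(y, [m * v for m, v in zip(mu, x)]))       # |(y, Bx)|
rhs = inner(abs_y, [m * v for m, v in zip(mu, abs_x)])    # (|y|, B|x|)
same_quotient = abs(rayleigh(mu, x) - rayleigh(mu, abs_x)) < 1e-12
print(lhs <= rhs, same_quotient)
```

Because taking absolute values does not change the norms of $y$ and $Bx$, the larger scalar product translates directly into the smaller angle $\angle\{|y|, B|x|\}$.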



^1 Here and below $\|\cdot\|$ denotes the Euclidean vector norm, i.e., $\|x\|^2 = (x, x) = x^H x$ for a real or
complex column-vector $x$, as well as the corresponding induced matrix norm.
^2 We define angles in $[0, \pi/2]$ between vectors by $\cos\angle\{x, y\} = |(x, y)| / (\|x\|\,\|y\|)$.
Fig. 2.1. The cone $C_{\phi_\gamma(x)}(Bx)$.

Therefore, changing the given vector $x$ to take its absolute value $|x|$ and replacing
the complex cone $C_{\phi_\gamma(x)}(Bx)$ with the real cone $C^{\mathbb{R}}_{\phi_\gamma(|x|)}(B|x|)$ lead to the relations

$\min_{y \in C_{\phi_\gamma(x)}(Bx)} \mu(y) = \min_{|y| \in |C_{\phi_\gamma(x)}(Bx)|} \mu(|y|) \ge \min_{|y| \in C^{\mathbb{R}}_{\phi_\gamma(|x|)}(B|x|)} \mu(|y|),$

but do not affect the starting Rayleigh quotient $\mu(x) = \mu(|x|)$. This proves the theorem
with the exception of the issue of whether the sharpness in the real case implies
the sharpness in the complex case; see the end of the proof of Theorem 2.7. □

Theorem 2.4. We have $w \in \partial C_{\phi_\gamma(x)}(Bx)$ and $\exists\, \alpha = \alpha_\gamma(x) > -\mu_i$ such that
$(B + \alpha I)w = Bx$. The inclusion $x \in \mathrm{span}\{x_i, x_{i+1}\}$ implies $w \in \mathrm{span}\{x_i, x_{i+1}\}$.

Proof. Assuming that $w$ is strictly inside the cone $C_{\phi_\gamma(x)}(Bx)$ implies that $w$ is a
point of a local minimum of the Rayleigh quotient. The Rayleigh quotient has only
one local (and global) minimum, $\mu_{\min}$, but the possibility $\mu(w) = \mu_{\min}$ is eliminated
by Theorem 2.2, so we obtain a contradiction, thus $w \in \partial C_{\phi_\gamma(x)}(Bx)$.

The necessary condition for a local minimum of a smooth real-valued function on
a smooth surface in a real vector space is that the gradient of the function is orthogonal
to the surface at the point of the minimum and directed inwards. In our case,
$C_{\phi_\gamma(x)}(Bx)$ is a circular cone with the axis $Bx$ and the gradient $\nabla\mu(w)$ is positively
proportional to $Bw - \mu(w)w$; see Figure 2.1. We first scale the vector $w$ such that
$(Bx - w, w) = 0$ so that the vector $Bx - w$ is an inward normal vector for $\partial C_{\phi_\gamma(x)}(Bx)$
at the point $w$. This inward normal vector must be positively proportional to the gradient,
$\beta(Bx - w) = Bw - \mu(w)w$ with $\beta > 0$, which gives $(B + \alpha I)w = \beta Bx$, where
$\alpha = \beta - \mu(w) > -\mu(w) > -\mu_i$. Here $\beta \ne 0$ as otherwise $w$ would be an eigenvector,
but $\mu(x) < \mu(w) \le \mu(x')$ by Theorem 2.2, where by assumptions $\mu_{i+1} < \mu(x)$,
while $\mu(x') < \mu_i$ by Theorem 2.1, which gives a contradiction. As the scaling of the
minimizer is irrelevant, we denote $w/\beta$ here by $w$ with a slight local abuse of notation.

Finally, since $(B + \alpha I)w = Bx$, inclusion $x \in \mathrm{span}\{x_i, x_{i+1}\}$ gives either the
required inclusion $w \in \mathrm{span}\{x_i, x_{i+1}\}$ or $w \in \mathrm{span}\{x_i, x_{i+1}, x_j\}$ with $\alpha = -\mu_j$ for
some $j \ne i$ and $j \ne i + 1$. We now show that the latter leads to a contradiction.
We have just proved that $\alpha > -\mu_i$, thus $j > i + 1$. Let $x = c_i x_i + c_{i+1} x_{i+1}$,
where we notice that $c_i \ne 0$ and $c_{i+1} \ne 0$ since $x$ is not an eigenvector. Then we
obtain $w = a_i c_i x_i + a_{i+1} c_{i+1} x_{i+1} + c_j x_j$ where $(B - \mu_j I)w = Bx$, therefore
$a_k = \mu_k / (\mu_k - \mu_j)$, $k = i, i + 1$. Since all eigenvalues are simple, $\mu_{i+1} \ne \mu_j$. We observe
that $0 < a_i < a_{i+1}$, i.e., in the mapping of $x$ to $w$ the coefficient in front of $x_i$ changes
by a smaller absolute value compared to the change in the coefficient in front of $x_{i+1}$.
Thus, $\mu(x) > \mu(a_i c_i x_i + a_{i+1} c_{i+1} x_{i+1}) \ge \mu(w)$ using the monotonicity of the Rayleigh
quotient in the absolute values of the coefficients of the eigenvector expansion of its
argument, which contradicts $\mu(w) > \mu(x)$ proved in Theorem 2.2. □
Fig. 2.2. The Rayleigh quotient gradient flow integration on the unit ball.



Theorem 2.4 characterizes the minimizer $w$ of the Rayleigh quotient on the cone
$C_{\phi_\gamma(x)}(Bx)$ for a fixed $x$. The next goal is to vary $x$, preserving its Rayleigh quotient
$\mu(x)$, and to determine conditions on $x$ leading to the smallest $\mu(w)$ in such a
setting. Intuition suggests (and we give the exact formulation and the proof later in
Theorem 2.6) that the poorest convergence of a gradient method corresponds to the
smallest norm of the gradient, so in the next theorem we analyze the behavior of the
gradient norm $\|\nabla\mu(x)\|$ of the Rayleigh quotient and the cone opening angle $\phi_\gamma(x)$.

Theorem 2.5. Let $\kappa \in (\mu_{i+1}, \mu_i)$ be fixed and the level set of the Rayleigh quotient
be denoted by $\mathcal{L}(\kappa) := \{x \ne 0 : \mu(x) = \kappa\}$. Both $\|\nabla\mu(x)\|\,\|x\|$ and $\phi_1(x) - \phi_\gamma(x)$ with
$0 < \gamma < 1$ attain their minima on $x \in \mathcal{L}(\kappa)$ in $\mathrm{span}\{x_i, x_{i+1}\}$.

Proof. By definition of the gradient, $\|\nabla\mu(x)\|\,\|x\| = 2\|Bx - \kappa x\| / \|x\|$ for $x \in \mathcal{L}(\kappa)$.
The Temple inequality $\|Bx - \kappa x\|^2 / \|x\|^2 \ge (\mu_i - \kappa)(\kappa - \mu_{i+1})$ is equivalent to the
operator inequality $(B - \mu_i I)(B - \mu_{i+1} I) \ge 0$, which evidently holds. The equality
here is attained only for $x \in \mathrm{span}\{x_i, x_{i+1}\}$.
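The Temple inequality and its equality case can be illustrated numerically. The sketch below assumes a diagonal $B$ with decreasingly ordered simple eigenvalues and $A = I$; the function name and the test vectors are hypothetical.

```python
import random

def temple_sides(mu, x):
    """Return ||Bx - kappa x||^2 / ||x||^2 and (mu_i - kappa)(kappa - mu_{i+1})."""
    xx = sum(v * v for v in x)
    kappa = sum(m * v * v for m, v in zip(mu, x)) / xx
    res2 = sum(((m - kappa) * v) ** 2 for m, v in zip(mu, x)) / xx
    # 0-based index i with mu[i+1] < kappa <= mu[i]; mu is sorted decreasingly.
    i = max(k for k in range(len(mu) - 1) if kappa <= mu[k])
    return res2, (mu[i] - kappa) * (kappa - mu[i + 1])

mu = [5.0, 3.0, 2.0, 0.5]

random.seed(2)
generic = [temple_sides(mu, [random.uniform(-1.0, 1.0) for _ in mu]) for _ in range(200)]
print(all(res2 >= lower - 1e-12 for res2, lower in generic))

# A vector in the span of the second and third eigenvectors gives equality.
res2_eq, lower_eq = temple_sides(mu, [0.0, 1.0, 2.0, 0.0])
print(abs(res2_eq - lower_eq) < 1e-12)
```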

Finally, we turn our attention to the angles. For $x \in \mathcal{L}(\kappa)$ the Pythagorean
theorem $\|Bx\|^2 = \|\kappa x\|^2 + \|Bx - \kappa x\|^2$ shows that

$a^2 := \frac{\|Bx - \kappa x\|^2}{\|Bx\|^2} = \frac{\|Bx - \kappa x\|^2 / \|x\|^2}{\kappa^2 + \|Bx - \kappa x\|^2 / \|x\|^2} \in (0, 1)$

is minimized together with $\|Bx - \kappa x\| / \|x\|$. But for a fixed $\gamma \in (0, 1)$ the function
$\arcsin(a) - \arcsin(\gamma a)$ is strictly increasing in $a \in (0, 1)$, which proves the proposition
for $\phi_1(x) - \phi_\gamma(x) = \arcsin(a) - \arcsin(\gamma a)$. □
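The monotonicity claim at the end of the proof can be verified by differentiation. The following short computation is added as a convenience; the name $h$ for the function is introduced here and does not appear in the paper.

```latex
h(a) = \arcsin(a) - \arcsin(\gamma a), \qquad 0 < a < 1, \quad 0 < \gamma < 1,
\\[4pt]
h'(a) = \frac{1}{\sqrt{1 - a^2}} - \frac{\gamma}{\sqrt{1 - \gamma^2 a^2}} > 0
\;\Longleftrightarrow\; \gamma^2 (1 - a^2) < 1 - \gamma^2 a^2
\;\Longleftrightarrow\; \gamma^2 < 1 .
```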

Now we are ready to show that the same subspace $\mathrm{span}\{x_i, x_{i+1}\}$ gives the smallest
change in the Rayleigh quotient $\mu(w) - \kappa$. The proof is based on analyzing the
negative normalized gradient flow of the Rayleigh quotient.

Theorem 2.6. Under the assumptions of Theorems 2.4 and 2.5 we denote
$I_\gamma(\kappa) := \{w : w \in \arg\min \mu(C_{\phi_\gamma(x)}(Bx));\ x \in \mathcal{L}(\kappa)\}$, the set of minimizers of
the Rayleigh quotient. Then $\arg\min \mu(I_\gamma(\kappa)) \in \mathrm{span}\{x_i, x_{i+1}\}$. (See Figure 2.2.)

Proof. The initial value problem for a gradient flow of the Rayleigh quotient,

(2.4) $y'(t) = -\frac{\nabla\mu(y(t))}{\|\nabla\mu(y(t))\|}, \quad t \ge 0, \qquad y(0) = w \in I_\gamma(\kappa),$

has the vector-valued solution $y(t)$, which preserves the norm of the initial vector $w$
since $d\|y(t)\|^2 / dt = 2(y(t), y'(t)) = 0$ as $(y, \nabla\mu(y)) = 0$. Without loss of generality we
assume $\|w\| = 1 = \|y(t)\|$. The Rayleigh quotient function $\mu(y(t))$ is decreasing since

$\frac{d}{dt}\mu(y(t)) = \left(\nabla\mu(y(t)), y'(t)\right) = \left(\nabla\mu(y(t)), -\frac{\nabla\mu(y(t))}{\|\nabla\mu(y(t))\|}\right) = -\|\nabla\mu(y(t))\| \le 0.$

As $\mu(y(0)) = \mu(w) < \mu_i$, the function $\mu(y(t))$ is strictly decreasing at least until it
reaches $\kappa > \mu_{i+1}$ as there are no eigenvalues in the interval $[\kappa, \mu(y(0))] \subset (\mu_{i+1}, \mu_i)$,
but only eigenvectors can be special points of ODE (2.4). The condition $\mu(y(\bar t)) = \kappa$
thus uniquely determines $\bar t$ for a given initial value $w$. The absolute value of the
decrease of the Rayleigh quotient along the path $L := \{y(t),\ 0 \le t \le \bar t\}$ is

$\mu(w) - \kappa = \mu(y(0)) - \mu(y(\bar t)) = \int_0^{\bar t} \|\nabla\mu(y(t))\|\, dt > 0.$

Our continuation method (2.4) using the normalized gradient flow is nonstandard,
but its advantage is that it gives the following simple expression for the length of $L$:
$\mathrm{Length}(L) = \int_0^{\bar t} \|y'(t)\|\, dt = \int_0^{\bar t} 1\, dt = \bar t$.

Since the initial value $w$ is determined by $x$, we compare a generic $x$ with the
special choice $x = x^* \in \mathrm{span}\{x_i, x_{i+1}\}$, using the superscript $*$ to denote all quantities
corresponding to the choice $x = x^*$. By Theorem 2.4, $x^* \in \mathrm{span}\{x_i, x_{i+1}\}$ implies
$w^* \in \mathrm{span}\{x_i, x_{i+1}\}$, so we have $y^*(t) \in \mathrm{span}\{x_i, x_{i+1}\}$, $0 \le t \le \bar t^*$, as $\mathrm{span}\{x_i, x_{i+1}\}$
is an invariant subspace for the gradient of the Rayleigh quotient. At the end points,
$\mu(y(\bar t)) = \kappa = \mu(x) = \mu(y^*(\bar t^*))$, by their definition. Our goal is to bound the
initial value $\mu(w^*) = \mu(y^*(0))$ by $\mu(w) = \mu(y(0))$, so we compare the lengths of the
corresponding paths $L^*$ and $L$ and the norms of the gradients along these paths.

We start with the lengths. We obtain $\phi_1(x^*) - \phi_\gamma(x^*) \le \phi_1(x) - \phi_\gamma(x)$ by
Theorem 2.5. Here the angle $\phi_1(x) - \phi_\gamma(x)$ is the smallest angle between any two
vectors on the cone boundaries $\partial C_{\phi_\gamma(x)}(Bx)$ and $\partial C_{\phi_1(x)}(Bx)$. Thus, $\phi_1(x) - \phi_\gamma(x) \le
\angle\{y(0), y(\bar t)\}$ as our one vector $y(0) = w \in \partial C_{\phi_\gamma(x)}(Bx)$ by Theorem 2.4, while the
other vector $y(\bar t)$ cannot be inside the cone $C_{\phi_1(x)}(Bx)$ since $\mu(w) > \kappa = \mu(y(\bar t))$ by
Theorem 2.2. As $y(t)$ is a unit vector, $\angle\{y(0), y(\bar t)\} \le \mathrm{Length}(L) = \bar t$ as the angle is
the length of the arc, the shortest curve from $y(0)$ to $y(\bar t)$ on the unit ball.

For our special $*$-choice, inequalities from the previous paragraph turn into equalities,
as $y^*(t)$ is in the intersection of the unit ball and the subspace $\mathrm{span}\{x_i, x_{i+1}\}$, so
the path $L^*$ is the arc from $y^*(0)$ to $y^*(\bar t^*)$ itself. Combining everything together,

$\bar t^* = \mathrm{Length}(L^*) = \angle\{y^*(0), y^*(\bar t^*)\} = \angle\{w^*, x^*\} = \phi_1(x^*) - \phi_\gamma(x^*)$
$\le \phi_1(x) - \phi_\gamma(x) \le \angle\{y(0), y(\bar t)\} \le \mathrm{Length}(L) = \bar t.$

By Theorem 2.5 on the norms of the gradient, $-\|\nabla\mu(y^*(t^*))\| \ge -\|\nabla\mu(y(t))\|$
for each pair of independent variables $t^*$ and $t$ such that $\mu(y^*(t^*)) = \mu(y(t))$. Using
Theorem 3.1, we conclude that $\mu(w^*) = \mu(y^*(0)) \le \mu(y(\bar t - \bar t^*)) \le \mu(y(0)) = \mu(w)$ as
$\bar t - \bar t^* \ge 0$, i.e., the subspace $\mathrm{span}\{x_i, x_{i+1}\}$ gives the smallest value $\mu(w)$. □
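In the special two-dimensional case the flow (2.4) can be written down explicitly, which makes the arc-length argument concrete. The following worked computation is a sketch under the additional assumptions that $x_i$ and $x_{i+1}$ are orthonormal and that the initial vector has real positive coefficients; the angle variable $\theta_0$ is introduced here for illustration.

```latex
y^*(t) = \cos(\theta_0 + t)\, x_i + \sin(\theta_0 + t)\, x_{i+1}, \qquad 0 < \theta_0 + t < \pi/2,
\\[4pt]
\mu(y^*(t)) = \mu_i \cos^2(\theta_0 + t) + \mu_{i+1} \sin^2(\theta_0 + t)
            = \tfrac{1}{2}(\mu_i + \mu_{i+1}) + \tfrac{1}{2}(\mu_i - \mu_{i+1}) \cos 2(\theta_0 + t),
\\[4pt]
\|\nabla \mu(y^*(t))\| = 2\,\|B y^* - \mu(y^*)\, y^*\| = (\mu_i - \mu_{i+1}) \sin 2(\theta_0 + t),
\\[4pt]
\int_0^{\bar t^*} \|\nabla \mu(y^*(t))\|\, dt
   = \tfrac{1}{2}(\mu_i - \mu_{i+1}) \bigl( \cos 2\theta_0 - \cos 2(\theta_0 + \bar t^*) \bigr)
   = \mu(y^*(0)) - \mu(y^*(\bar t^*)).
```

Here the flow simply rotates the unit vector at unit speed from $w^*$ towards $x_{i+1}$, so the path is a circular arc and its length equals the angle it subtends.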

By Theorem 2.6 the poorest convergence is attained with $x \in \mathrm{span}\{x_i, x_{i+1}\}$ and
with the corresponding minimizer $w \in \mathrm{span}\{x_i, x_{i+1}\}$ described in Theorem 2.4, so
our analysis is now reduced to the two-dimensional space $\mathrm{span}\{x_i, x_{i+1}\}$.

Theorem 2.7. Bound (2.3) holds and is sharp for $x \in \mathrm{span}\{x_i, x_{i+1}\}$.

Proof. Assuming $\|x\| = 1$ and $\|x_i\| = \|x_{i+1}\| = 1$, we derive

(2.5) $|(x, x_i)|^2 = \frac{\mu(x) - \mu_{i+1}}{\mu_i - \mu_{i+1}} > 0 \quad$ and $\quad |(x, x_{i+1})|^2 = \frac{\mu_i - \mu(x)}{\mu_i - \mu_{i+1}},$

and similarly for $w \in \mathrm{span}\{x_i, x_{i+1}\}$ where $(B + \alpha I)w = Bx$.

Since $B > 0$, we have $x = (I + \alpha B^{-1})w$. Assuming $\alpha = -\mu_{i+1}$, this identity
implies $x = x_i$, which contradicts our assumption that $x$ is not an eigenvector. For
$\alpha \ne -\mu_{i+1}$ and $\alpha > -\mu_i$ by Theorem 2.4, the inverse $(B + \alpha I)^{-1}$ exists.

Next we prove that $\alpha > 0$ and that it is a strictly decreasing function of $\kappa :=
\mu(x) \in (\mu_{i+1}, \mu_i)$. Indeed, using $Bx = (B + \alpha I)w$ and our cosine-based definition of
the angles, we have $0 < (w, (B + \alpha I)w)^2 = (w, Bx)^2 = \|w\|^2 \|Bx\|^2 \cos^2\phi_\gamma(x)$, where
$\|Bx\|^2 \cos^2\phi_\gamma(x) = \|Bx\|^2 - \gamma^2 \|Bx - \kappa x\|^2$. We substitute $w = (B + \alpha I)^{-1}Bx$, which
gives $((B + \alpha I)^{-1}Bx, Bx)^2 = \|(B + \alpha I)^{-1}Bx\|^2 \left(\|Bx\|^2 - \gamma^2 \|Bx - \kappa x\|^2\right)$. Using
(2.5), multiplication by $(\mu_i + \alpha)^2 (\mu_{i+1} + \alpha)^2$ leads to a simple quadratic equation,

$a\alpha^2 + b\alpha + c = 0, \quad a = \gamma^2\left(\kappa(\mu_i + \mu_{i+1}) - \mu_i\mu_{i+1}\right), \quad b = 2\gamma^2\kappa\mu_i\mu_{i+1}, \quad c = -(1 - \gamma^2)\mu_i^2\mu_{i+1}^2,$

for $\alpha$. As $a > 0$, $b > 0$, and $c < 0$, the discriminant is positive and the two solutions
for $\alpha$, corresponding to the minimum and maximum of the Rayleigh quotient on
$C_{\phi_\gamma(x)}(Bx)$, have different signs. The proof of Theorem 2.4 analyzes the direction
of the gradient of the Rayleigh quotient to conclude that $\beta > 0$ and $\alpha > -\mu(w)$
correspond to the minimum. Repeating the same arguments with $\beta < 0$ shows that
$\alpha < -\mu(w)$ corresponds to the maximum. But $\mu(w) > 0$ since $B > 0$, hence the
negative $\alpha$ corresponds to the maximum and thus the positive $\alpha$ corresponds to the
minimum. We observe that the coefficients $a > 0$ and $b > 0$ are evidently increasing
functions of $\kappa \in (\mu_{i+1}, \mu_i)$, while $c < 0$ does not depend on $\kappa$. Thus $\alpha > 0$ is strictly
decreasing in $\kappa$, and taking $\kappa \to \mu_i$ gives the smallest $\alpha = \mu_{i+1}(1 - \gamma)/\gamma > 0$.
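The quadratic equation for $\alpha$ can be checked numerically in the two-dimensional case. The sketch below assumes the coefficients $a = \gamma^2(\kappa(\mu_i + \mu_{i+1}) - \mu_i\mu_{i+1})$, $b = 2\gamma^2\kappa\mu_i\mu_{i+1}$, $c = -(1 - \gamma^2)\mu_i^2\mu_{i+1}^2$, takes the positive root, forms $w = (B + \alpha I)^{-1}Bx$, and verifies that $w$ lies on the boundary of the cone, i.e., that $\cos^2\angle\{w, Bx\} = \cos^2\phi_\gamma(x)$. The function names and the sample values are hypothetical.

```python
import math

def alpha_min(mu_i, mu_next, gamma, kappa):
    """Positive root of a*alpha^2 + b*alpha + c = 0 (assumed coefficients, see lead-in)."""
    a = gamma ** 2 * (kappa * (mu_i + mu_next) - mu_i * mu_next)
    b = 2.0 * gamma ** 2 * kappa * mu_i * mu_next
    c = -(1.0 - gamma ** 2) * mu_i ** 2 * mu_next ** 2
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

def boundary_defect(mu_i, mu_next, gamma, kappa):
    """cos^2 angle{w, Bx} minus cos^2 phi_gamma(x) for w = (B + alpha I)^{-1} B x."""
    # Unit x in span{x_i, x_{i+1}} with mu(x) = kappa; squared components as in (2.5).
    x = [math.sqrt((kappa - mu_next) / (mu_i - mu_next)),
         math.sqrt((mu_i - kappa) / (mu_i - mu_next))]
    bx = [mu_i * x[0], mu_next * x[1]]
    alpha = alpha_min(mu_i, mu_next, gamma, kappa)
    w = [bx[0] / (mu_i + alpha), bx[1] / (mu_next + alpha)]
    w_bx = w[0] * bx[0] + w[1] * bx[1]
    ww = w[0] ** 2 + w[1] ** 2
    bx2 = bx[0] ** 2 + bx[1] ** 2
    res2 = ((mu_i - kappa) * x[0]) ** 2 + ((mu_next - kappa) * x[1]) ** 2
    return w_bx ** 2 / (ww * bx2) - (1.0 - gamma ** 2 * res2 / bx2)

print(abs(boundary_defect(3.0, 1.0, 0.5, 2.0)) < 1e-12)
print([alpha_min(3.0, 1.0, 0.5, kappa) for kappa in (2.0, 2.5, 3.0)])
```

The second printed list illustrates the monotone decrease of $\alpha$ in $\kappa$; its last entry corresponds to the limit $\kappa \to \mu_i$.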

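Since $(B + \alpha I)w = Bx$ where now $\alpha > 0$, condition $(x, x_i) \ne 0$ implies $(w, x_i) \ne 0$
and $(x, x_{i+1}) = 0$ implies $(w, x_{i+1}) = 0$, so we introduce the convergence factor as

$\sigma^2(\alpha) := \frac{\mu_i - \mu(w)}{\mu(w) - \mu_{i+1}} \left(\frac{\mu_i - \mu(x)}{\mu(x) - \mu_{i+1}}\right)^{-1} = \left|\frac{(w, x_{i+1})}{(w, x_i)}\right|^2 \left|\frac{(x, x_i)}{(x, x_{i+1})}\right|^2 = \left(\frac{\mu_{i+1}}{\mu_i}\,\frac{\mu_i + \alpha}{\mu_{i+1} + \alpha}\right)^2,$

where we use (2.5) and again $(B + \alpha I)w = Bx$. We notice that $\sigma(\alpha)$ is a strictly
decreasing function of $\alpha > 0$ and thus takes its largest value for $\alpha = \mu_{i+1}(1 - \gamma)/\gamma$,
giving $\sigma = \gamma + (1 - \gamma)\mu_{i+1}/\mu_i$, i.e., bound (2.3) that we are seeking.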
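The last substitution can be spelled out. The computation below is a worked sketch that assumes the closed form $\sigma(\alpha) = \frac{\mu_{i+1}}{\mu_i}\,\frac{\mu_i + \alpha}{\mu_{i+1} + \alpha}$, which follows from $(w, x_k) = \frac{\mu_k}{\mu_k + \alpha}(x, x_k)$ for $k = i, i+1$; the symbol $\alpha^*$ for the smallest admissible $\alpha$ is introduced here.

```latex
\alpha^* = \mu_{i+1}\,\frac{1 - \gamma}{\gamma}
\;\Longrightarrow\;
\mu_{i+1} + \alpha^* = \frac{\mu_{i+1}}{\gamma},
\qquad
\mu_i + \alpha^* = \frac{\gamma \mu_i + (1 - \gamma)\mu_{i+1}}{\gamma},
\\[4pt]
\sigma(\alpha^*) = \frac{\mu_{i+1}}{\mu_i} \cdot \frac{\gamma \mu_i + (1 - \gamma)\mu_{i+1}}{\mu_{i+1}}
                 = \gamma + (1 - \gamma)\,\frac{\mu_{i+1}}{\mu_i},
\\[4pt]
\frac{d}{d\alpha}\,\frac{\mu_i + \alpha}{\mu_{i+1} + \alpha}
   = -\frac{\mu_i - \mu_{i+1}}{(\mu_{i+1} + \alpha)^2} < 0 .
```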

The convergence factor $\sigma^2(\alpha)$ cannot be improved without introducing extra
terms or assumptions. But $\sigma^2(\alpha)$ deals with $w \in C_{\phi_\gamma(x)}(Bx)$, not with the actual iterate
$x'$. We now show that for $\kappa \in (\mu_{i+1}, \mu_i)$ there exist a vector $x \in \mathrm{span}\{x_i, x_{i+1}\}$ and
a preconditioner $T$ satisfying (2.2) such that $\kappa = \mu(x)$ and $x' \in \mathrm{span}\{w\}$ in both real
and complex cases. In the complex case, let us choose $x$ such that $\mu(x) = \kappa$ and $x = |x|$
according to (2.5); then the real vector $w = |w| \in C_{\phi_\gamma(x)}(Bx)$ is a minimizer of the
Rayleigh quotient on $C_{\phi_\gamma(x)}(Bx)$, since $\mu(w) = \mu(|w|)$ and $|(w, Bx)| \le (|w|, B|x|)$.

Finally, for a real $x$ with $\mu(x) = \kappa$ and a real properly scaled $y \in C_{\phi_\gamma(x)}(Bx)$
there is a real matrix $T$ satisfying (2.2) such that $y = Bx - (I - T)(Bx - \kappa x)$, which
leads to (2.1) with $\mu(x)x' = y$. Indeed, for the chosen $x$ we scale $y \in C_{\phi_\gamma(x)}(Bx)$ such
that $(y, Bx - y) = 0$ so $\|Bx - y\| = \sin\phi_\gamma(x)\|Bx\| = \gamma\|Bx - \kappa x\|$. As vectors $Bx - y$
and $\gamma(Bx - \kappa x)$ are real and have the same length, there exists a real Householder
reflection $H$ such that $Bx - y = H\gamma(Bx - \kappa x)$. Setting $T = I - \gamma H$ we obtain the
required identity. Any Householder reflection is symmetric and has only two distinct
eigenvalues $\pm 1$, so we conclude that $T$ is real symmetric (and thus Hermitian in the
complex case) and satisfies (2.2). □
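The Householder construction of the preconditioner can be sketched in two dimensions. The code below assumes a positive diagonal $2 \times 2$ matrix $B$, builds a vector $y$ on the cone boundary by rotating $Bx$ by the angle $\phi_\gamma(x)$ and scaling it so that $(y, Bx - y) = 0$, and then forms $T = I - \gamma H$. The function name, the rotation-based choice of $y$, and the test data are assumptions made for this illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sharp_preconditioner(mu, x, gamma):
    """Return (T, y, Bx, r) with y = Bx - (I - T) r, where r = Bx - kappa x."""
    bx = [mu[0] * x[0], mu[1] * x[1]]
    kappa = dot(x, bx) / dot(x, x)
    r = [bx[0] - kappa * x[0], bx[1] - kappa * x[1]]
    sin_phi = gamma * math.sqrt(dot(r, r) / dot(bx, bx))
    cos_phi = math.sqrt(1.0 - sin_phi ** 2)
    # Rotate Bx by phi and shrink by cos(phi), so that (y, Bx - y) = 0.
    y = [cos_phi * (cos_phi * bx[0] - sin_phi * bx[1]),
         cos_phi * (sin_phi * bx[0] + cos_phi * bx[1])]
    u = [gamma * r[0], gamma * r[1]]     # vector to be reflected
    v = [bx[0] - y[0], bx[1] - y[1]]     # target vector of the same length
    n = [u[0] - v[0], u[1] - v[1]]       # normal of the reflecting line
    nn = dot(n, n)
    # Householder reflection H = I - 2 n n^T / (n, n); fall back to H = I if u equals v.
    h = [[1.0, 0.0], [0.0, 1.0]]
    if nn > 0.0:
        h = [[(1.0 if j == k else 0.0) - 2.0 * n[j] * n[k] / nn for k in range(2)]
             for j in range(2)]
    t = [[(1.0 if j == k else 0.0) - gamma * h[j][k] for k in range(2)] for j in range(2)]
    return t, y, bx, r

gamma = 0.5
t, y, bx, r = sharp_preconditioner([3.0, 1.0], [1.0, 1.0], gamma)
i_minus_t = [[(1.0 if j == k else 0.0) - t[j][k] for k in range(2)] for j in range(2)]
y_check = [bx[j] - sum(i_minus_t[j][k] * r[k] for k in range(2)) for j in range(2)]
print(max(abs(y[j] - y_check[j]) for j in range(2)) < 1e-12)
```

Since $(I - T)^2 = \gamma^2 H^2 = \gamma^2 I$ for a reflection $H$, the symmetric matrix $I - T$ has eigenvalues $\pm\gamma$, which gives $\|I - T\| = \gamma$ and positive definiteness of $T$.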

3. Appendix. The integration of inverse functions theorem follows. 
Theorem 3.1. Let $f, g : [0, b] \to \mathbb{R}$ for $b > 0$ be strictly monotone increasing
smooth functions and suppose that for $a \in [0, b]$ we have $f(a) = g(b)$. If for all
$\alpha, \beta \in [0, b]$ with $f(\alpha) = g(\beta)$ the derivatives satisfy $f'(\alpha) \le g'(\beta)$, then for any
$\xi \in [0, a]$ we have $f(a - \xi) \ge g(b - \xi)$.

ANDREW KNYAZEV AND KLAUS NEYMEYR

Proof. For any $\xi \in [0, a]$ we have (using $f(a) = g(b)$)

$\xi = \int_{g(b - \xi)}^{g(b)} (g^{-1})'(y)\, dy = \int_{f(a - \xi)}^{g(b)} (f^{-1})'(y)\, dy.$

If $y = f(\alpha) = g(\beta)$, then for the derivatives of the inverse functions it holds that
$(g^{-1})'(y) \le (f^{-1})'(y)$. Since $f$ and $g$ are strictly monotone increasing functions, the
integrands are positive functions and $g(b - \xi) < g(b)$ as well as $f(a - \xi) < f(a) = g(b)$.
Comparing the lower limits of the integrals gives the statement of the theorem. □
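A simple instance of Theorem 3.1 can be checked numerically. The sketch below uses the hypothetical pair $f(t) = t$ and $g(t) = e^t - e + 1$ on $[0, 1]$ with $a = b = 1$, so that $f(a) = g(b) = 1$; at any matching values $f(\alpha) = g(\beta) \in [0, 1]$ one has $\beta > 0$ and hence $f'(\alpha) = 1 \le e^\beta = g'(\beta)$. These functions are chosen only for illustration and do not come from the paper.

```python
import math

def f(t):
    return t

def g(t):
    return math.exp(t) - math.e + 1.0

a = b = 1.0
xis = [k / 100.0 for k in range(101)]            # sample points xi in [0, a]
gaps = [f(a - xi) - g(b - xi) for xi in xis]     # should be nonnegative
print(all(gap >= -1e-12 for gap in gaps))
```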

Conclusions. We present a new geometric approach to the convergence analysis 
of a preconditioned fixed-step gradient eigensolver which reduces the derivation of 
the convergence rate bound to a two-dimensional case. The main novelty is in the 
use of a continuation method for the gradient flow of the Rayleigh quotient to locate
the two-dimensional subspace corresponding to the smallest change in the Rayleigh 
quotient and thus to the slowest convergence of the gradient eigensolver. 

An elegant and important result such as Theorem 1.1 should ideally have a 
textbook-level proof. We have been trying, unsuccessfully, to find such a proof for 
several years, so its existence remains an open problem. 